A meta collection of some R Markdown strategies.
If you haven’t read it already, make sure to read Dr. Alison Hill’s fantastic blogpost:
How I teach R Markdown
Alison is an R Markdown superstar on the RStudio Education team. Her blogpost lays out her well-informed approach to teaching R Markdown.
She has taught:
- College students as a professor across a semester
- In-person professional learners at rstudio::conf in 1-2 day workshops
- Digital learners in Pharma/Finance/etc via shorter online workshops
To summarize her post:
Again - GO READ her blogpost for the additional links and guides she includes.
My blogpost below is meant to be a sister article to hers. It is framed around a similar approach that we use in Customer Success, but differs in that we’re not doing as much long-form education. Alison’s approach is well-informed and very useful in the context of direct teaching activity, which is why I wanted to share it as well!
I work on a different team than Alison at RStudio; specifically, I’m a Customer Success Manager. This means that I work with existing RStudio Pro Product customers, most often people who have RStudio Connect. I work exclusively with High Tech/Software customers, meaning that they are typically already doing very sophisticated work with R in production, and I’m helping them further eliminate friction or empower their data science teams to do more with R.
A core part of my job is knowledge sharing around how to use open-source software like R Markdown with or without our Pro Products. Thus most of my work is Strategic in nature, although I do often give shorter 30-60 min training sessions that are Tactical.
A strategy is a set of guidelines used to achieve an overall objective, whereas tactics are the specific actions aimed at adhering to those guidelines. Source: Wikipedia
Thus my usual framing is to cover topics that inform the learner of new strategies (ways of solving a problem) without necessarily having to teach all the tactics (nuts and bolts of how it all works).
This post will focus on 4 core strategies that show why R Markdown is SO useful and absolutely worth learning, with links to external tactics/guides/write-ups on how to accomplish the various tasks.
Goal: Capture code, text/comments, and output in a single document
This is the most common use of R Markdown, and is often how it is taught in University coursework. R Markdown is a tool for Literate Programming, which in summary is:
A programming paradigm introduced by Donald Knuth in which a computer program is given an explanation of its logic in a natural language, such as English, interspersed with snippets of macros and traditional source code, from which compilable source code can be generated.
Whether you talk about Minimum Viable Product or Most Valuable Player, it works! Since R Markdown is a form of Literate Programming, you can write all of your comments and notes, and execute your code, within it.
- diffable and easily human readable in version control
An example here is Dave Robinson’s #TidyTuesday screencasts + code.
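To make the literate programming idea concrete, here is a minimal sketch of a document where prose and code live side by side in one plain-text file. The file contents and the built-in `cars` dataset are my own illustrative assumptions, and `knitr::purl()` is used only to show that the code can be pulled back out as a plain script.

```r
# The chunk fence is built with strrep() only so that this example can itself
# sit inside a fenced code block.
fence <- strrep("`", 3)

# A tiny R Markdown document: YAML header, one line of prose, one code chunk.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Stopping distance notes\"",
  "output: html_document",
  "---",
  "",
  "Cars that travel faster need more distance to stop.",
  "",
  paste0(fence, "{r speed-summary}"),
  "summary(cars$speed)",
  fence
), rmd_path)

# Extract just the code from the document into a plain .R script
# (skipped if knitr is not installed).
if (requireNamespace("knitr", quietly = TRUE)) {
  script_path <- knitr::purl(rmd_path, output = tempfile(fileext = ".R"), quiet = TRUE)
  cat(readLines(script_path), sep = "\n")
}
```

Because the source is plain text, a change to either the prose or the code shows up as an ordinary line diff in version control.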
Goal: Generate output natively in R for consumption
This is typically the second most common use of R Markdown. Since R Markdown can knit to all sorts of different formats, it is a powerful tool for creating data products like:
- xaringan (remark.js)
- flexdashboard
- blogdown for easily extensible custom websites or blogs
- distill for scientific writing, native to the web (this website is built in distill)
Most importantly, these formats are created with code, so you get the benefit of reproducibility, automation, etc. while still generating data products in the format your non-coder colleagues expect.
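As a rough sketch of "one source, many outputs", the same .Rmd file can be rendered to more than one format by passing several format names to `rmarkdown::render()`. The document text and the two formats chosen here are assumptions for illustration; rendering is skipped if rmarkdown or pandoc is not available.

```r
# A small source document with one inline R expression.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"One source, several formats\"",
  "---",
  "",
  "The mean speed in the cars data is `r mean(cars$speed)`."
), rmd_path)

# Output formats to produce from the same source file.
formats <- c("html_document", "md_document")

can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  # render() returns the path of each file it produced.
  outputs <- rmarkdown::render(
    rmd_path,
    output_format = formats,
    output_dir = tempdir(),
    quiet = TRUE
  )
  print(basename(outputs))
}
```

Swapping in a slide, dashboard, or Word format is a matter of changing the format names, provided the relevant package is installed.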
Goal: Scale data science tasks, automate the boring stuff, create robust pipelines
Less widely known, but just as important, is the idea of R Markdown as a meta-document that lets you bring in other code or automate processes.
As it’s much larger in scope than a single bullet point, I’d recommend going to read Emily Riederer’s blog post on RMarkdown Driven Development. It’s “an approach of using R Markdown within the larger scope of the analysis engineering concept” presented by Hilary Parker.
A brief summary of her blogpost:
I tend to think of each RMarkdown as having a “data product” (an analytical engine calibrated to answer some specific question) nestled within it. Surfacing this tool just requires a touch of forethought before beginning an analysis and a bit of clean-up afterwards.
In this post, I describe RMarkdown Driven Development: a progression of stages between a single ad-hoc RMarkdown script and more advanced and reusable data products like R projects and packages. This approach has numerous benefits.
- Parameterized reports: pass a parameter such as a state to the R Markdown report and render 50x reports at once!
- blastula: blastula provides a framework to generate HTML emails from R Markdown, which are then sent by an email server or RStudio Connect.
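Rendering one report per state could look roughly like the sketch below. The template contents, the `report_file()` and `render_state()` helpers, and the output naming scheme are all hypothetical; only `rmarkdown::render()` and its `params` argument come from the rmarkdown package. Only three states are rendered to keep the example quick.

```r
# A minimal parameterized template: the YAML declares a default for `state`.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"State report\"",
  "output: html_document",
  "params:",
  "  state: \"Texas\"",
  "---",
  "",
  "This report covers `r params$state`."
), rmd_path)

# Hypothetical helper: turn a state name into an output file name.
report_file <- function(state) {
  paste0("report-", gsub(" ", "-", tolower(state)), ".html")
}

# Hypothetical helper: render the template once for a given state.
render_state <- function(state, template = rmd_path, out_dir = tempdir()) {
  rmarkdown::render(
    input = template,
    params = list(state = state),   # overrides the YAML default
    output_file = report_file(state),
    output_dir = out_dir,
    envir = new.env(),              # clean environment for each report
    quiet = TRUE
  )
}

# state.name is a built-in vector of the 50 US state names.
states <- head(state.name, 3)

if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  reports <- vapply(states, render_state, character(1))
  print(basename(reports))
}
```

Replacing `head(state.name, 3)` with `state.name` would produce all 50 reports from the same template.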
R Markdown is a first-class citizen on RStudio Connect, and you can interactively generate new reports based on parameters, or schedule R Markdown documents to re-execute.
Code for ETL - run an ETL process through an automated R Markdown report. This could query against a SQL database or a Spark cluster to process ETL jobs, all on a schedule as frequent as every minute or as infrequent as once a year.
Scheduled reporting - maybe your boss needs a report built every Monday? You can do that too - pulling in new data and re-generating a report on a specific time-schedule all with no need for human intervention.
Emailing w/ blastula - maybe your boss is too busy to consume a full report every day - send a conditional email directly to them if a specific number is hit or missed, all with code in R! This email is built with R Markdown, and could contain plots, tables, raw data, or attach ANY R Markdown-based document (so… basically anything).
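A conditional email can be sketched as below. The metric, target, addresses, and credentials file name are all made up for illustration; the blastula functions used (`compose_email()`, `md()`, `smtp_send()`, `creds_file()`) are real, but the sending step is left commented out because it needs a real mail server and credentials.

```r
# Hypothetical numbers: this week's value and the target it should reach.
weekly_signups <- 82L
signup_target <- 100L

# Only alert when the target is missed.
should_alert <- weekly_signups < signup_target

if (should_alert && requireNamespace("blastula", quietly = TRUE)) {
  # Compose an HTML email; md() lets the body be written in Markdown.
  email <- blastula::compose_email(
    body = blastula::md(sprintf(
      "Signups this week: **%d** (target: %d).",
      weekly_signups, signup_target
    )),
    footer = blastula::md("Sent automatically from R.")
  )

  # Sending requires real SMTP credentials, so it is shown but not run.
  # The addresses and the credentials file below are placeholders.
  # blastula::smtp_send(
  #   email,
  #   to = "boss@example.com",
  #   from = "reports@example.com",
  #   subject = "Weekly signups below target",
  #   credentials = blastula::creds_file("email_creds")
  # )
}
```

On RStudio Connect, the email object would be handed to `blastula::attach_connect_email()` inside the scheduled report instead of being sent over SMTP.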
Goal: Don’t repeat yourself, generate input templates or output documents from code.
Using R Markdown for templating is normally thought of for rmarkdown::render() + parameters, but there are additional techniques to solve specific problems that don’t fit neatly into parameterized reports as well.
- rmarkdown::render()
- purrr::walk() to generate new outputs from a template within the R Markdown report.
- results="asis" in the chunk option.
Minimal example below with the palmerpenguins dataset. I’ve included the code as an image placed below, as I’m essentially nesting R Markdown chunks inside R Markdown chunks in an R Markdown-based website. Full copy-pastable code at: https://git.io/JJBcC.
Note that I’m writing one function and calling it n times; it loops across all the data based on the different inputs.
Which generates the following document:
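As a rough stand-in for the pattern described above (one function, called once per group, inside a chunk with results="asis"), here is a minimal sketch. It swaps in the built-in `iris` dataset and a base R loop in place of palmerpenguins and purrr::walk(), so it is not the code from the linked gist; `species_section()` is a hypothetical helper.

```r
# Build the markdown for one section. When the chunk option results="asis"
# is set, text written with cat() is treated as markdown by knitr instead of
# being wrapped as console output.
species_section <- function(species, data = iris) {
  rows <- data[data$Species == species, ]
  paste0(
    "## ", species, "\n\n",
    "Mean sepal length: ", round(mean(rows$Sepal.Length), 2),
    " cm (n = ", nrow(rows), ")\n\n"
  )
}

# One function, called once per species. In a real document this loop would
# live in a chunk whose header is {r, results="asis"}.
for (species in levels(iris$Species)) {
  cat(species_section(species))
}
```

Each call emits a second-level heading plus a sentence, so the rendered document gets one section per species without any copy-pasted chunks.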
- whisker is a templating engine for R conforming to the Mustache specification.
- glue style syntax to add data to templates either in memory or to an output file, where my mental model is it is glue for documents rather than strings.
- inst directory of an R package.
Minimal whisker example below:
First, some input data:
library(whisker)
# Values that the template will reference by name
data <- list(
  name = "Chris",
  value = 10000,
  taxed_value = 10000 - (10000 * 0.4),
  in_ca = TRUE
)
Then a template:
template <-
'Hello {{name}}
You have just won ${{value}}!
{{#in_ca}}
Well, ${{taxed_value}}, after taxes.
{{/in_ca}}'
Now, fill the template!
# Substitute the values from `data` into the template, then print the result
text <- whisker.render(template, data)
cat(text)
# Output
Hello Chris
You have just won $10000!
Well, $6000, after taxes.
I use whisker natively to generate the readme files for each week’s #TidyTuesday submission. Separate blog-post to come for that!
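Writing rendered templates to files might look something like this sketch. It is not the actual #TidyTuesday workflow: the template text, the week entries, the folder layout, and the `write_readme()` helper are all hypothetical, and only `whisker.render()` comes from the whisker package.

```r
library(whisker)

# Hypothetical README template with two Mustache tags.
readme_template <- "# TidyTuesday week {{week}}\n\nDataset: {{dataset}}\n"

# Hypothetical inputs: one list of values per README to generate.
weeks <- list(
  list(week = 1L, dataset = "example-a"),
  list(week = 2L, dataset = "example-b")
)

out_dir <- file.path(tempdir(), "tidytuesday")

# Render the template for one week and write it to <out_dir>/week-XX/README.md.
write_readme <- function(info) {
  week_dir <- file.path(out_dir, sprintf("week-%02d", info$week))
  dir.create(week_dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(week_dir, "README.md")
  writeLines(whisker.render(readme_template, info), path)
  path
}

readme_paths <- vapply(weeks, write_readme, character(1))
cat(readLines(readme_paths[1]), sep = "\n")
```

The same template is reused for every week, so a change to the README layout only needs to be made in one place.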
- usethis::use_template() provides a more ready-to-use function than whisker, and uses whisker internally.
- use_template() is used as the engine for a templating function in other packages.
Sharla Gelfand, the “Queen of Reproducible Reporting”, has put together lots of material on using the usethis::use_template() workflow.
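A package-level templating function built on `usethis::use_template()` could be sketched as follows. The package name "reportkit", the template file, and its `{{title}}` and `{{author}}` tags are hypothetical; the arguments passed to `use_template()` are real ones from usethis. The function is only defined here, since calling it requires an active project and an installed package that ships the template.

```r
# Hypothetical helper as it might appear in a package called "reportkit" that
# ships inst/templates/weekly-report.Rmd containing {{title}} and {{author}}.
use_weekly_report <- function(title, author, save_as = "weekly-report.Rmd") {
  usethis::use_template(
    template = "weekly-report.Rmd",            # looked up in the package's templates directory
    save_as = save_as,                         # path within the active project
    data = list(title = title, author = author),  # values filled in via whisker
    open = FALSE,
    package = "reportkit"
  )
}

# Intended usage inside a project, assuming reportkit existed:
# use_weekly_report("Sales update", author = "Chris")
```

The end user only ever sees one function call, while the template itself stays under version control inside the package.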
So that’s an overview of my approach to sharing knowledge around R Markdown, and like Alison said:
But remember: there is no one way to learn R Markdown, and no one way to teach it either. I love seeing the creativity of the community when introducing the R Markdown family - so keep them coming!